Abstract

We prove the formula $C(a,b) = K(a|C(a,b)) + C(b|a,C(a,b)) + O(1)$ that expresses the plain complexity
of a pair in terms of prefix-free and plain conditional complexities of its components.



The well known formula from Shannon information theory states that $H(\xi,\eta) = H(\xi) + H(\eta|\xi)$.
Here $\xi$, $\eta$ are random variables and $H$ stands for the Shannon entropy. A similar formula for algorithmic
information theory was proven by Kolmogorov and Levin [5] and says that

$$C(a,b) = C(a) + C(b|a) + O(\log n),$$

where $a$ and $b$ are binary strings of length at most $n$ and $C$ stands for Kolmogorov complexity (as defined
initially by Kolmogorov [?]; now this version is usually called plain Kolmogorov complexity). Informally,
$C(u)$ is the minimal length of a program that produces $u$, and $C(u|v)$ is the minimal length of a program
that transforms $v$ to $u$; the complexity $C(u,v)$ of a pair $(u,v)$ is defined as the complexity of some standard
encoding of this pair.

This formula implies that $I(a:b) = I(b:a) + O(\log n)$, where $I(u:v)$ is the amount of information in
$u$ about $v$ defined as $C(v) - C(v|u)$; this property is often called "symmetry of information". The term
$O(\log n)$, as was noted in [5], cannot be replaced by $O(1)$. Later Levin found an $O(1)$-exact version of this
formula that uses the so-called prefix-free version of complexity:

$$K(a,b) = K(a) + K(b|a,K(a)) + O(1);$$

this version, reported in [?], was also discovered by Chaitin [1]. In the definition of prefix-free complexity
we restrict ourselves to self-delimiting programs: reading a program from left to right, the interpreter
determines where it ends. See, e.g., [7] for the definitions and proofs of these results.
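
The symmetry of information mentioned above can be read off from the Kolmogorov-Levin formula in a few lines. The following is only a sketch; it assumes the standard fact that swapping the components of a pair changes its complexity by $O(1)$, which is not stated explicitly in the text.

```latex
% Symmetry of information from the Kolmogorov-Levin formula.
% Assumption: C(a,b) = C(b,a) + O(1) (reordering a pair costs O(1) bits).
\begin{align*}
C(a,b) &= C(a) + C(b \mid a) + O(\log n), \\
C(b,a) &= C(b) + C(a \mid b) + O(\log n), \\
C(a,b) &= C(b,a) + O(1) \\
\Longrightarrow\quad
C(b) - C(b \mid a) &= C(a) - C(a \mid b) + O(\log n), \\
\text{i.e.}\quad I(a:b) &= I(b:a) + O(\log n).
\end{align*}
```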

In this note we provide a somewhat similar formula for plain complexity (also with $O(1)$-precision):

Theorem 1. 

$$C(a,b) = K(a|C(a,b)) + C(b|a,C(a,b)) + O(1).$$



Proof. The proof is not difficult after the formula is presented. The $\le$-inequality is a generalization of the
inequality $C(x,y) \le K(x) + C(y)$ and can be proven in the same way. Assume that $p$ is a self-delimiting
program that maps $C(a,b)$ to $a$, and $q$ is a (not necessarily self-delimiting) program that maps $a$ and $C(a,b)$
to $b$. The natural idea is to concatenate $p$ and $q$; since $p$ is self-delimiting, given $pq$ one may find where $p$
ends and $q$ starts, and then use $p$ to get $a$ and $q$ to get $b$. However, this idea needs some refinement: in both
cases we need to know $C(a,b)$ in advance; one may use the length of $pq$ as a replacement for it, but since
we have not yet proven the equality, we have no right to do so.

So more caution is needed. Assume that the $\le$-inequality is not true and $C(a,b)$ exceeds $K(a|C(a,b)) +
C(b|a,C(a,b))$ by some $d$. Then we can concatenate a prefix-free description $\bar d$ of $d$ (that has length
$O(\log d)$), then $p$ and then $q$. Now we have enough information: first we find $d$, then $C(a,b) = |p| + |q| + d$,
then $a$, and finally $b$. Therefore $C(a,b)$ does not exceed $O(\log d) + |p| + |q| + O(1)$, therefore $d \le
O(\log d) + O(1)$ and $d = O(1)$. The $\le$-inequality is proven.
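
The length accounting in this argument can be written out as follows. This is a sketch only: it takes $p$ and $q$ to be shortest programs, so that $|p| = K(a|C(a,b))$ and $|q| = C(b|a,C(a,b))$, and it uses the observation that the decoder recovers $|p| + |q|$ as the total length of the description minus the length of the self-delimiting prefix $\bar d$.

```latex
% Length accounting for the description \bar{d} p q.
% Assumed: |p| = K(a | C(a,b)), |q| = C(b | a, C(a,b)),
%          d = C(a,b) - |p| - |q| > 0.
\begin{align*}
|\bar d\, p\, q| &= O(\log d) + |p| + |q|, \\
C(a,b) &\le |\bar d\, p\, q| + O(1) = O(\log d) + |p| + |q| + O(1), \\
C(a,b) &= |p| + |q| + d \\
\Longrightarrow\quad d &\le O(\log d) + O(1)
\quad\Longrightarrow\quad d = O(1).
\end{align*}
```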

Let us prove the reverse inequality. In this proof we use the interpretation of prefix-free complexity
as the logarithm of a priori probability (see, e.g., [?] for details). If $n = C(a,b)$ is given, one can start
enumerating all pairs $(x,y)$ such that $C(x,y) \le n$; there are at most $2^{n+1}$ of them and the pair $(a,b)$ is
among them. For fixed $x$, for each pair $(x,y)$ in this enumeration we add $2^{-n-1}$ to the probability of $x$; in
this way we approximate (from below) the semimeasure $P(x|n) = N_x 2^{-n-1}$. Therefore, we get an upper
bound for $K(a|n)$:

$$K(a|n) \le -\log P(a|n) + O(1) = n - \log_2 N_a + O(1),$$

where $N_a$ is the number of $y$'s such that $C(a,y) \le n$. On the other hand, given $a$ and $n$, we can enumerate
all these $y$, and $b$ is among them, so $b$ can be described by its ordinal number in this enumeration, therefore

$$C(b|a,n) \le \log_2 N_a + O(1).$$

Summing these two inequalities, we get the desired result. □ 
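
For reference, the two facts used in this half of the proof can be checked line by line. The sketch below assumes only the definitions given in the proof ($n = C(a,b)$, and $N_x$ the number of $y$ with $C(x,y) \le n$): first that $P(\cdot|n)$ has total weight at most 1, then the summation itself.

```latex
% n = C(a,b); N_x = #{ y : C(x,y) <= n }.
\begin{align*}
\sum_x P(x \mid n) &= 2^{-n-1} \sum_x N_x
  = 2^{-n-1}\,\#\{(x,y) : C(x,y) \le n\}
  \le 2^{-n-1}\cdot 2^{n+1} = 1, \\
K(a \mid n) + C(b \mid a, n)
  &\le \bigl(n - \log_2 N_a\bigr) + \log_2 N_a + O(1)
  = n + O(1) = C(a,b) + O(1).
\end{align*}
```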

We can now get several known $O(1)$-equalities for complexities as corollaries of Theorem 1.

• Recall that $C(a,C(a)) = C(a)$, and $K(a,K(a)) = K(a)$ (the $O(1)$-additive terms are omitted here
and below), since the shortest program for $a$ also describes its own length.

• For empty $b$ we get $C(a) = K(a|C(a))$, see also [3, 6].

• For empty $a$ we get $C(b) = C(b|C(b))$, see also [3, 6].

• The last two equalities imply that $C(u|C(u)) = K(u|C(u))$.

The direct proof for the last three statements is also easy. To show that $C(a) \le C(a|C(a))$, assume that
some program $p$ maps $C(a)$ to $a$ and is $d$ bits shorter than $C(a)$. Then we add a prefix $\bar d$ of length
$O(\log d)$ that describes $d$ in a self-delimiting way, and note that $\bar d p$ determines first $C(a)$ and then
$a$, so $d \le O(\log d) + O(1)$ and $d = O(1)$. To show that $K(a|C(a)) \le C(a|C(a))$ we note that in the
presence of $C(a)$ every program of length $C(a)$ can be considered as a self-delimiting one, since its
length is known.

Levin also pointed out that $C(a)$ can be defined in terms of prefix-free complexity as a minimal $i$
such that $K(a|i) \le i$. (Indeed, for $i = C(a)$ both sides differ by $O(1)$, and changing the right hand side
by $d$, we change the left hand side by $O(\log d)$, so the intersection point is unique up to $O(1)$-precision.
In other terms, $K(a|i) = i + O(1)$ implies $C(a) = i + O(1)$.)

• More generally, we may let $a$ be some fixed computable function of $b$: if $a = f(b)$, we get $C(b) =
K(f(b)|C(b)) + C(b|f(b),C(b))$.
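
The substitutions behind the corollaries for empty $a$ and empty $b$ can be spelled out as follows. This is a sketch; the symbol $\Lambda$ for the empty string is introduced here for illustration and does not appear in the text, and the facts $C(a,\Lambda) = C(a)$ and $C(\Lambda|\cdot) = K(\Lambda|\cdot) = 0$ (all up to $O(1)$) are the standard ones assumed.

```latex
% \Lambda denotes the empty string (notation assumed here).
% All equalities hold up to O(1) additive terms.
\begin{align*}
b = \Lambda:&\quad C(a) = C(a,\Lambda)
   = K(a \mid C(a)) + C(\Lambda \mid a, C(a)) = K(a \mid C(a)), \\
a = \Lambda:&\quad C(b) = C(\Lambda, b)
   = K(\Lambda \mid C(b)) + C(b \mid \Lambda, C(b)) = C(b \mid C(b)), \\
\text{hence}&\quad C(u \mid C(u)) = C(u) = K(u \mid C(u)).
\end{align*}
```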

One can also see that Theorem 1 can be formally derived from Levin's results mentioned above. To
show that

$$C(b|a,C(a,b)) = C(a,b) - K(a|C(a,b))$$

we need to show that the right hand side $i = C(a,b) - K(a|C(a,b))$ satisfies the equality $K(b|a,C(a,b),i) = i$
with $O(1)$-precision, which implies $C(b|a,C(a,b)) = i$. (We omit all $O(1)$-terms, as usual.) In the
condition of the last equality we may replace $i$ by $K(a|C(a,b))$ since $C(a,b)$ is already in the condition.
Therefore, we need to show that

$$K(b|a,C(a,b),K(a|C(a,b))) = C(a,b) - K(a|C(a,b))$$

or

$$K(b|a,C(a,b),K(a|C(a,b))) + K(a|C(a,b)) = C(a,b).$$

But the sum in the left hand side equals $K(a,b|C(a,b))$ due to the formula for prefix complexity of a pair
$(a,b)$ relativized to the condition $C(a,b)$, and it remains to note that $K(a,b|C(a,b)) = C(a,b)$. (This
alternative proof was suggested by Peter Gács.)
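
The alternative derivation can be condensed into a single chain of equalities. The abbreviations $n$, $m$ and $i$ below are introduced only for this sketch; every step holds up to an $O(1)$ additive term.

```latex
% Abbreviations (introduced here): n = C(a,b), m = K(a | n), i = n - m.
% All equalities hold up to O(1) additive terms.
\begin{align*}
K(a,b \mid n) &= K(a \mid n) + K(b \mid a, K(a \mid n), n)
   && \text{(Levin's formula, relativized to } n\text{)} \\
K(a,b \mid n) &= n
   && \text{(} K(z \mid C(z)) = C(z) \text{ for } z = (a,b)\text{)} \\
\Longrightarrow\quad K(b \mid a, n, m) &= n - m = i \\
\Longrightarrow\quad K(b \mid a, n, i) &= i
   && \text{(given } n\text{, } m \text{ and } i \text{ determine each other)} \\
\Longrightarrow\quad C(b \mid a, n) &= i = C(a,b) - K(a \mid C(a,b))
   && \text{(Levin's characterization of } C\text{)}
\end{align*}
```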

We can obtain a different version of Theorem 1:

Proposition 1. 

$$C(a,b) = K(a|C(a,b)) + C(b|a,K(a|C(a,b))) + O(1).$$

Proof. Indeed, the $\le$-inequality can be shown in the same way as the $\le$-inequality in the proof of
Theorem 1, hence it remains to show the $\ge$-inequality. Let $p$ be a program of length $C(b|a,C(a,b))$ that
computes $b$ given $a$ and $C(a,b)$. (The program $p$ is not assumed to be self-delimiting.) Knowing $p$, we can also
compute $b$ given $a$ and $K(a|C(a,b))$. First, we compute $|p| + K(a|C(a,b))$, and this sum equals $C(a,b)$
(Theorem 1). Then, using $a$ again, we compute $b$. Hence $C(b|a,C(a,b)) \ge C(b|a,K(a|C(a,b)))$. □
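
The $\ge$-direction can be made explicit, including the $O(1)$ correction that is implicit when the sum $|p| + K(a|C(a,b))$ is said to equal $C(a,b)$. In this sketch the names $n$, $m$ and $c$ are introduced for illustration; $c$ is the bounded discrepancy allowed by Theorem 1 and can be described in $O(1)$ bits.

```latex
% Abbreviations (introduced here): n = C(a,b), m = K(a | n), |p| = C(b | a, n).
% c = n - |p| - m is bounded by Theorem 1, so it costs O(1) bits to specify.
\begin{align*}
n &= |p| + m + c, \qquad |c| = O(1), \\
b &= p(a, n) = p(a,\, |p| + m + c), \\
C(b \mid a, m) &\le |p| + O(1) = C(b \mid a, C(a,b)) + O(1).
\end{align*}
```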

One may complain that Theorem 1 is a bit strange since it uses prefix-free complexity in one term
and plain complexity in the second (conditional) part. As we have already noted, one cannot use $C$ in
both parts: $C(a,b)$ can exceed even $C(a) + C(b)$ by a logarithmic term. One may then ask whether it
is possible to exchange plain and prefix-free complexity in the two terms we have and prove that $C(a,b)$
equals something like

$$C(a|C(a,b)) + K(b|a,C(a,b)).$$

It turns out that it is not possible: even the inequality $C(a,b) \le C(a) + K(b|a) + O(1)$ is not true. At first it
seems that one could concatenate a self-delimiting program $q$ that produces $b$ given $a$ and a (plain) program
$p$ that produces $a$, in the hope that the endpoint of $q$ can be reconstructed, and then the rest is $p$. However,
this idea does not work: the program $q$ is self-delimiting only when $a$ is known; to know $a$ we need to have
$p$, and to know $p$ we need to know where $q$ ends, so there is a vicious circle here.

Let us show that the problem is unavoidable and that for infinitely many pairs $(x,y)$ we have

$$C(x,y) \ge C(x) + K(y|x) + \log n - 2\log\log n - O(1),$$

where $n = |x| + |y|$ is the total length of both strings. To construct such a pair, let $n = 2^k$ for some $k$, and
choose a string $r$ of length $n$ and a natural number $i \le n$ such that $C(r,i) \ge n + \log n$. (For every $n$, there
are $n 2^n$ pairs $(r,i)$, so one of them has high complexity.)

Let $x = r_1 \ldots r_i$ and $y = r_{i+1} \ldots r_n$. Note that $C(x,y) = C(r,i) \ge n + \log n$ and that $C(x) \le i$. Furthermore,
$K(y|x) \le K(y|x,n) + K(n)$. Here $K(y|x,n) \le |y| = n - i$, since $x$ and $n$ determine $|y|$ and $K(y \,|\, |y|) \le |y|$;
on the other hand, $K(n) \le 2\log\log n$.¹
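
Putting these bounds together gives the claimed gap. The sketch below writes the $O(1)$ terms explicitly and uses the standard estimate $K(k) \le 2\log k + O(1)$ applied to $k = \log n$ (an assumption about how the bound on $K(n)$ is obtained).

```latex
% n = 2^k, |x| = i, |y| = n - i; O(1) terms are written explicitly.
\begin{align*}
C(x) &\le i + O(1), \\
K(y \mid x) &\le K(y \mid x, n) + K(n) + O(1)
   \le (n - i) + 2\log\log n + O(1), \\
C(x) + K(y \mid x) &\le n + 2\log\log n + O(1), \\
C(x,y) &\ge n + \log n - O(1) \\
\Longrightarrow\quad
C(x,y) &\ge C(x) + K(y \mid x) + \log n - 2\log\log n - O(1).
\end{align*}
```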

There is still some chance to get a formula for the plain complexity of a pair $(x,y)$ that involves only
plain complexities, assuming that we add some condition in the left hand side, i.e., to get some formula
of the type $C(a,b|l) = l$. Unfortunately, the best result in this direction that we managed to get is the
following observation:

Proposition 2. For all $x,y$ there exists a (unique up to $O(1)$-precision) pair $(k,l)$ such that $C(x|l) = k$,
$C(y|x,k) = l$. For such a pair we have $C(x|l) = k$, $C(y|x,k) = l$ and this implies $C(x,y|k,l) = C(x,y|k) =
C(x,y|l) = l + k$ (all with $O(1)$-precision).

Proof. The pair in question is a fixed point of $F : (k,l) \mapsto (C(x|l), C(y|x,k))$. It exists and is unique since
$F$ maps points at distance $d$ into points at distance $O(\log d)$. (Here "distance" means geometric distance
between points in $\mathbb{Z}^2$.)
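
The uniqueness part of this fixed-point argument can be sketched as follows; existence is not covered here. The names $(k_1,l_1)$, $(k_2,l_2)$ and the constant $c$ are introduced for illustration. The first two lines use the fact that, given one integer, another at distance $t$ from it can be specified with $O(\log t)$ extra bits.

```latex
% (k_1,l_1), (k_2,l_2): two fixed points of F up to O(1); d: their distance.
% c: a constant that does not depend on x, y (assumed name).
\begin{align*}
|C(x \mid l_1) - C(x \mid l_2)| &\le c\log(|l_1 - l_2| + 2), \\
|C(y \mid x, k_1) - C(y \mid x, k_2)| &\le c\log(|k_1 - k_2| + 2), \\
d &\le \operatorname{dist}\bigl(F(k_1,l_1), F(k_2,l_2)\bigr) + O(1)
   \le O(\log(d + 2)) + O(1) \\
\Longrightarrow\quad d &= O(1).
\end{align*}
```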

¹ As a byproduct of this example and the discussion above we conclude that $K(x|y)$ cannot be defined as minimal prefix-free
complexity of a program that maps $y$ to $x$: the value $K(y|x)$ can be smaller than $\min\{K(p) : U(p,x) = y\}$, where $U$ is the universal
function. Indeed, in this case we would have the inequality $C(x,y) \le C(x) + K(y|x)$, since the prefix-free description of a program
that maps $x$ to $y$ and a shortest description for $x$ can be concatenated into a description of the pair $(x,y)$.

Using the relativized version of the statement $C(z) = C(z|C(z))$, we conclude that $C(x|k,l) = k$ and
$C(y|x,k,l) = l$. Let us prove now that $C(x,y|k,l) = k + l$. Indeed, the standard proof of the Kolmogorov-Levin
theorem shows that for any $x, y, k', l'$ such that

$$C(x,y|k',l') < k' + l'$$

we have either

$$C(x|k',l') < k' \quad\text{or}\quad C(y|x,k',l') < l'.$$

Hence if $C(x|k,l) = k$ and $C(y|x,k,l) = l$ for some $k$ and $l$, we have $C(x,y|k,l) \ge k + l$ (otherwise $k$ and
$l$ can be decreased to get a contradiction). By concatenation we obtain also that $C(x,y|k,l) \le k + l$, so
$C(x,y|k,l) = k + l$ (all equations with $O(1)$-precision).

It remains to show that $C(x,y|k,l) = k + l$ implies $C(x,y|k) = k + l$ and, similarly, $C(x,y|l) = k + l$.
Indeed, a program of length $k + l$ that maps $(k,l)$ to $(x,y)$ can be used to map $k$ (or $l$) to $(x,y)$: knowing
the length of the program and one of the values of $k$ and $l$, we reconstruct the other value. □
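
The last step can be written out for the condition $k$ (the case of $l$ is symmetric). In this sketch, $s$ and $c$ are names introduced for illustration: $s$ is a shortest program mapping $(k,l)$ to $(x,y)$, and $c$ is the bounded discrepancy between $|s|$ and $k + l$, which can be specified in $O(1)$ bits.

```latex
% s: a program with |s| = C(x,y | k,l) = k + l + c, where |c| = O(1).
\begin{align*}
l &= |s| - k - c, \\
(x,y) &= s(k, l) = s(k,\, |s| - k - c), \\
C(x,y \mid k) &\le |s| + O(1) = k + l + O(1), \\
C(x,y \mid k) &\ge C(x,y \mid k, l) - O(1) = k + l - O(1).
\end{align*}
```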

Remark 1. One can ask what can be said about pairs $(k',l')$ such that $C(x|l') \le k'$ and $C(y|x,k') \le l'$. The
pair $(k,l)$ given by Proposition 2 is not necessarily coordinate-wise minimal: for example, taking a large
$k'$ that contains full information about $y$ we may let $l' = 0$. Indeed, $C(x|0) \le k'$ (since $k'$ is large) and
$C(y|x,k') \le 0$ (since $k'$ determines $y$). However, to get some decrease in $k'$ (compared to $k$) or $l'$ (compared
to $l$) we need to change the other parameter by an exponentially bigger quantity, since the information
distance between $i$ and $i'$ is $O(\log|i - i'|)$. The change in the other parameter should be its increase,
otherwise we could repeat the arguments exchanging $k$ and $l$ and get a contradiction (each of two changes
could not be exponentially big compared to the other one).